1,821 research outputs found

    Learning a Static Analyzer from Data

    To be practically useful, modern static analyzers must precisely model the effects of both the statements of the programming language and the frameworks used by the program under analysis. While important, manually addressing these challenges is difficult for at least two reasons: (i) the effects on the overall analysis can be non-trivial, and (ii) as the size and complexity of modern libraries increase, so does the number of cases the analysis must handle. In this paper we present a new, automated approach for creating static analyzers: instead of manually providing the various inference rules of the analyzer, the key idea is to learn these rules from a dataset of programs. Our method consists of two ingredients: (i) a synthesis algorithm capable of learning a candidate analyzer from a given dataset, and (ii) a counter-example guided learning procedure which generates new programs beyond those in the initial dataset, critical for discovering corner cases and ensuring the learned analysis generalizes to unseen programs. We implemented and instantiated our approach for the task of learning JavaScript static analysis rules for a subset of points-to analysis and for allocation sites analysis. These are challenging yet important problems that have received significant research attention. We show that our approach is effective: our system automatically discovered practical and useful inference rules for many cases that are tricky to identify manually and are missed by state-of-the-art, manually tuned analyzers.

    Automatic Induction of Classification Rules from Examples Using N-Prism

    www.dis.port.ac.uk/~bramerma
    One of the key technologies of data mining is the automatic induction of rules from examples, particularly the induction of classification rules. Most work in this field has concentrated on the generation of such rules in the intermediate form of decision trees. An alternative approach is to generate modular classification rules directly from the examples. This paper seeks to establish a revised form of the rule generation algorithm Prism as a credible candidate for use in the automatic induction of classification rules from examples in practical domains where noise may be present and where predicting the classification for previously unseen instances is the primary focus of attention.

    Impact of Mandatory Diversity Training: Lessons from a Private University

    Attendance at diversity training programs is often dictated by management, and participants find themselves caught between their genuine desire to broaden their understanding of the subject and resentment at being forced to do so. The outcomes of these mandatory training programs have not been systematically assessed. This study looks at the cognitive, attitudinal, and behavioral impacts of attending such a program and finds valuable lessons learned and cautious room for optimism

    On the predictability of domain-independent temporal planners

    Temporal planning is a research discipline that addresses the problem of generating a totally or partially ordered sequence of actions that transforms the environment from some initial state to a desired goal state, while taking into account time constraints and action durations. Because it can describe and address temporal constraints, temporal planning is of critical importance for a wide range of real-world applications. Predicting the performance of temporal planners can lead to significant improvements in the area, as planners can then be combined to boost performance on a given set of problem instances. This paper investigates the predictability of state-of-the-art temporal planners by introducing a new set of temporal-specific features and exploiting them to generate classification and regression empirical performance models (EPMs) of the considered planners. EPMs are also tested with regard to their ability to select the most promising planner for efficiently solving a given temporal planning problem. Our extensive empirical analysis indicates that the introduced set of features allows the generation of EPMs that can effectively perform algorithm selection, and that the use of EPMs is therefore a promising direction for improving the state of the art of temporal planning, hence fostering the use of planning in real-world applications.

    Data mining via ILP: The application of progol to a

    As far as this author is aware, this is the first paper to describe the application of Progol to enantioseparations. A scheme is proposed for data mining a relational database of published enantioseparations using Progol. The application of the scheme is described and a preliminary assessment of the usefulness of the resulting generalisations is made using their accuracy, size, ease of interpretation and chemical justification

    Semantic Context Forests for Learning-Based Knee Cartilage Segmentation in 3D MR Images

    The automatic segmentation of human knee cartilage from 3D MR images is a useful yet challenging task due to the thin sheet structure of the cartilage, with diffuse boundaries and inhomogeneous intensities. In this paper, we present an iterative multi-class learning method to segment the femoral, tibial and patellar cartilage simultaneously, which effectively exploits the spatial contextual constraints between bone and cartilage, and also between different cartilages. First, based on the fact that the cartilage grows only in certain areas of the corresponding bone surface, we extract distance features not only to the surface of the bone but, more informatively, to the densely registered anatomical landmarks on the bone surface. Second, we introduce a set of iterative discriminative classifiers in which, at each iteration, probability comparison features are constructed from the class confidence maps derived by previously learned classifiers. These features automatically embed the semantic context information between different cartilages of interest. Validated on a total of 176 volumes from the Osteoarthritis Initiative (OAI) dataset, the proposed approach demonstrates high robustness and accuracy of segmentation in comparison with existing state-of-the-art MR cartilage segmentation methods. Comment: MICCAI 2013: Workshop on Medical Computer Visio

    ForestHash: Semantic Hashing With Shallow Random Forests and Tiny Convolutional Networks

    Hash codes are efficient data representations for coping with the ever growing amounts of data. In this paper, we introduce a random forest semantic hashing scheme that embeds tiny convolutional neural networks (CNN) into shallow random forests, with near-optimal information-theoretic code aggregation among trees. We start with a simple hashing scheme, where random trees in a forest act as hashing functions by setting '1' for the visited tree leaf, and '0' for the rest. We show that traditional random forests fail to generate hashes that preserve the underlying similarity between the trees, rendering the random forests approach to hashing challenging. To address this, we propose to first randomly group arriving classes at each tree split node into two groups, obtaining a significantly simplified two-class classification problem, which can be handled using a light-weight CNN weak learner. Such a random class grouping scheme enables code uniqueness by forcing each class to share its code with different classes in different trees. A non-conventional low-rank loss is further adopted for the CNN weak learners to encourage code consistency by minimizing intra-class variations and maximizing inter-class distance for the two random class groups. Finally, we introduce an information-theoretic approach for aggregating codes of individual trees into a single hash code, producing a near-optimal unique hash for each class. The proposed approach significantly outperforms state-of-the-art hashing methods for image retrieval tasks on large-scale public datasets, and performs at the level of other state-of-the-art image classification techniques while using a more compact, efficient and scalable representation. This work proposes a principled and robust procedure to train and deploy in parallel an ensemble of light-weight CNNs, instead of simply going deeper. Comment: Accepted to ECCV 201
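
The "simple hashing scheme" mentioned in the abstract above (each tree sets 1 for the visited leaf and 0 for the rest) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: `stump`, `tree_code` and `forest_hash` are hypothetical helpers, the trees are toy threshold functions rather than the CNN-based learners of the paper, and the per-tree codes are simply concatenated instead of using the information-theoretic aggregation the abstract describes.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a trained tree: any function that maps a
# sample to the index of the leaf it reaches.
Tree = Callable[[Sequence[float]], int]


def stump(feature: int, threshold: float) -> Tree:
    """Toy depth-1 tree with two leaves (0 = left, 1 = right)."""
    return lambda x: 1 if x[feature] > threshold else 0


def tree_code(leaf: int, n_leaves: int) -> List[int]:
    """One-hot code for a single tree: 1 for the visited leaf, 0 elsewhere."""
    return [1 if i == leaf else 0 for i in range(n_leaves)]


def forest_hash(sample: Sequence[float], trees: List[Tree], n_leaves: int) -> List[int]:
    """Concatenate the per-tree one-hot codes (naive aggregation, an assumption)."""
    code: List[int] = []
    for tree in trees:
        code.extend(tree_code(tree(sample), n_leaves))
    return code


trees = [stump(0, 0.5), stump(1, 0.0), stump(0, -1.0)]
print(forest_hash([0.7, -0.2], trees, n_leaves=2))  # [0, 1, 1, 0, 0, 1]
```

Each tree contributes exactly one set bit, so two samples share a bit only when they land in the same leaf of the same tree.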

    Determining appropriate approaches for using data in feature selection

    Feature selection is increasingly important in data analysis and machine learning in the big data era. However, how to use the data in feature selection, i.e. using either ALL or PART of a dataset, has become a serious and tricky issue. Whilst the conventional practice of using all the data in feature selection may lead to selection bias, using part of the data may, on the other hand, lead to underestimating the relevant features under some conditions. This paper investigates these two strategies systematically in terms of reliability and effectiveness, and then determines their suitability for datasets with different characteristics. The reliability is measured by the Average Tanimoto Index and the Inter-method Average Tanimoto Index, and the effectiveness is measured by the mean generalisation accuracy of classification. The computational experiments are carried out on ten real-world benchmark datasets and fourteen synthetic datasets. The synthetic datasets are generated with a pre-set number of relevant features and varied numbers of irrelevant features and instances, with different levels of noise added. The results indicate that the PART approach is more effective in reducing the bias when the size of a dataset is small but starts to lose its advantage as the dataset size increases.
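
To make the reliability measure in the abstract above concrete, here is a minimal sketch of a Tanimoto index between selected feature subsets. The abstract does not define the Average Tanimoto Index precisely, so this assumes it is the mean pairwise Tanimoto similarity (intersection size over union size) across the subsets chosen in repeated runs. The function names and the example feature names are hypothetical.

```python
from itertools import combinations
from typing import Iterable, List, Set


def tanimoto(a: Iterable[str], b: Iterable[str]) -> float:
    """Tanimoto similarity of two feature subsets: |A & B| / |A | B|."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0  # two empty subsets are treated as identical
    return len(set_a & set_b) / len(union)


def average_tanimoto(subsets: List[Set[str]]) -> float:
    """Mean pairwise Tanimoto similarity (assumed reading of the index)."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two feature subsets")
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


# Feature subsets selected in three hypothetical runs of one method.
runs = [{"f1", "f2", "f3"}, {"f1", "f2", "f4"}, {"f1", "f2", "f3"}]
print(round(average_tanimoto(runs), 3))  # 0.667
```

A value near 1 means the method keeps selecting the same features across runs, while a value near 0 means the selections barely overlap.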

    Neutron scattering in a d_{x^2-y^2}-wave superconductor with strong impurity scattering and Coulomb correlations

    We calculate the spin susceptibility at and below T_c for a d_{x^2-y^2}-wave superconductor with resonant impurity scattering and Coulomb correlations. Both the impurity scattering and the Coulomb correlations act to maintain peaks in the spin susceptibility, as a function of momentum, at the Brillouin zone edge. These peaks would otherwise be suppressed by the superconducting gap. The predicted amount of suppression of the spin susceptibility in the superconducting state compared to the normal state is in qualitative agreement with results from recent magnetic neutron scattering experiments on La_{1.86}Sr_{0.14}CuO_4 for momentum values at the zone edge and along the zone diagonal. The predicted peak widths in the superconducting state, however, are narrower than those in the normal state, a narrowing which has not been observed experimentally. Comment: 24 pages (12 tarred-compressed-uuencoded Postscript figures), REVTeX 3.0 with epsf macros, UCSBTH-94-1